If a random variable follows a normal distribution: \(X \sim N(\mu,\sigma^2)\)
Properties
\[ f_X\left( x \right) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
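A minimal sketch of this density in Python (the function name `normal_pdf` is an illustrative choice, not from the text):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The standard normal density at its mean is 1/sqrt(2*pi) ≈ 0.3989.
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # → 0.3989
```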
The Normal distribution approximates the Binomial distribution when \(np\) and \(n(1-p)\) are large enough (usually \(\ge 10\)).
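This can be checked numerically. With \(n=100\) and \(p=0.5\), both \(np\) and \(n(1-p)\) equal 50, so the normal CDF (with a continuity correction) should track the exact binomial CDF closely; the sketch below uses only the standard library:

```python
import math

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # mean np, sd sqrt(np(1-p))

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Exact P(X <= 55) from the binomial pmf vs. the normal approximation
# with a continuity correction (evaluate at 55 + 0.5).
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(56))
approx = normal_cdf((55 + 0.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))  # the two values agree closely
```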
Let \(Z_1, \dots, Z_k\) be independent standard Normal variables. Let \(V=\sum_{i=1}^k Z_i^2\). Then \(V\) follows a \(\chi^2\) distribution with \(k\) degrees of freedom: \(V \sim \chi^2(k)\).
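A quick simulation check of this fact: a \(\chi^2(k)\) variable has mean \(k\) and variance \(2k\), so summing \(k\) squared standard normals should reproduce those moments. The sample sizes below are arbitrary choices:

```python
import random

random.seed(0)
k, n = 3, 200_000  # degrees of freedom, number of simulated draws

# V = sum of k squared standard normals; should have mean k=3, variance 2k=6.
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(n)]
mean = sum(samples) / n
var = sum((v - mean) ** 2 for v in samples) / n
print(round(mean, 2), round(var, 2))  # close to 3 and 6
```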
Properties
The Normal distribution is the conjugate prior for the mean \(\mu\) (given known \(\sigma^2\)). Consider IID observations \(x_i \sim N(\mu, \sigma^2)\) where \(\sigma^2\) is known. If the prior is \(\mu \sim N(\mu_0,\sigma^2_0)\):
Posterior: \(\mu' \sim N\left(\frac{\frac{n\bar{x}}{\sigma^2}+\frac{\mu_0}{\sigma^2_0}}{\frac{n}{\sigma^2}+\frac{1}{\sigma^2_0}}, \left(\frac{n}{\sigma^2}+\frac{1}{\sigma^2_0}\right)^{-1}\right)\)
The posterior predictive distribution is \(\tilde{x} \mid X \sim N\left(\mu', \sigma'^2 + \sigma^2\right)\), where \(\sigma'^2\) denotes the posterior variance.
Interpretation: The posterior mean is a weighted average of the prior mean and the sample mean, weighted by their respective precisions.
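The update can be sketched directly from the posterior formula; the data and prior parameters below are hypothetical:

```python
# Conjugate update for the mean of a Normal with known variance.
# Prior: mu ~ N(mu0, s0sq); likelihood: x_i ~ N(mu, s2) with s2 known.
def posterior(xs, s2, mu0, s0sq):
    n = len(xs)
    xbar = sum(xs) / n
    prec = n / s2 + 1 / s0sq                      # posterior precision
    mean = (n * xbar / s2 + mu0 / s0sq) / prec    # precision-weighted average
    return mean, 1 / prec                         # posterior mean and variance

# Hypothetical example: prior N(0, 1), known sigma^2 = 1.
xs = [2.0, 2.5, 1.5, 2.0]
mean, var = posterior(xs, 1.0, 0.0, 1.0)
print(mean, var)  # → 1.6 0.2 (shrunk from xbar = 2.0 toward the prior mean 0)
```

Note how the posterior mean 1.6 sits between the sample mean 2.0 and the prior mean 0, exactly the weighted average described above.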
The Gamma distribution is the conjugate prior for the precision \(\lambda = \frac{1}{\sigma^2}\). Consider IID observations \(x_i \sim N(\mu, \sigma^2)\) where \(\mu\) is known. Let the prior be \(\lambda \sim \text{Gamma}(\alpha, \beta)\).
Posterior: \(\lambda \sim \text{Gamma}\left(\alpha + \frac{n}{2}, \beta + \frac{\sum_{i=1}^n (x_i - \mu)^2}{2}\right)\)
Interpretation: The prior acts as if we had \(2\alpha\) prior observations with a sum of squared deviations equal to \(2\beta\).
The posterior predictive distribution follows a non-standardized Student’s t-distribution.
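The Gamma update itself is a two-line computation. Here is a sketch using the shape/rate parameterization from the posterior formula above; the data and prior values are hypothetical:

```python
# Conjugate update for the precision lambda = 1/sigma^2 with known mean mu.
# Prior: lambda ~ Gamma(alpha, beta) (shape/rate); likelihood: x_i ~ N(mu, 1/lambda).
def gamma_posterior(xs, mu, alpha, beta):
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)  # sum of squared deviations from mu
    return alpha + n / 2, beta + ss / 2

# Hypothetical example: prior Gamma(2, 1), known mu = 0.
a, b = gamma_posterior([1.0, -1.0, 2.0], 0.0, 2.0, 1.0)
print(a, b)  # → 3.5 4.0
```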
This section covers properties of probability bounds and convergence. These are helpful for:
On the other hand, the mathematical proofs for this part are relatively simple (and bring little insight to our practice), so we will only cover the intuition and implications of these properties and leave the proofs to the tutorial questions.
\[ P(X>t)\leq\frac{\mathbb{E}(X)}{t} \]
Implication
Example question: If the average survival time for a specific cancer is 5 years, what is an upper bound on the probability that a patient survives more than 20 years?
By Markov's inequality, \(P(X > 20) \leq \frac{\mathbb{E}(X)}{20} = \frac{5}{20} = 0.25\).
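As a sanity check, the bound can be compared against a simulation. Modeling survival times as exponential with mean 5 is purely an illustrative assumption; Markov's inequality holds for any non-negative distribution with this mean:

```python
import random

random.seed(1)
# Survival times with mean 5 (the exponential model is an illustrative choice).
times = [random.expovariate(1 / 5) for _ in range(100_000)]
frac_over_20 = sum(t > 20 for t in times) / len(times)
markov_bound = 5 / 20

# The empirical tail probability must stay below the Markov bound of 0.25.
print(round(frac_over_20, 4), frac_over_20 <= markov_bound)
```

For the exponential the true tail is \(e^{-4} \approx 0.018\), far below 0.25: Markov's bound is valid but often loose.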
\[ P(|X-\mu|\geq t)\leq\frac{\sigma^2}{t^2} \]
or, in other words,
\[ \small P(|Z|\geq k)\leq\frac{1}{k^2}, \text{ where } Z=(X-\mu)/\sigma \]
A clear bound for outliers:
| Deviations (k) | Chebyshev Guarantee (Minimum) | Normal Distribution (Actual) |
|---|---|---|
| 2 σ | At least 75% inside | 95.4% inside |
| 3 σ | At least 89% inside | 99.7% inside |
| 5 σ | At least 96% inside | 99.9999% inside |
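The table can be reproduced with the standard normal CDF, using \(P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})\):

```python
import math

def normal_inside(k):
    """P(|Z| < k) for a standard normal Z."""
    return math.erf(k / math.sqrt(2))

for k in (2, 3, 5):
    chebyshev = 1 - 1 / k**2   # guaranteed minimum for ANY distribution
    actual = normal_inside(k)  # what the normal distribution actually achieves
    print(k, round(chebyshev, 4), round(actual, 6))
```

The Chebyshev column (0.75, 0.8889, 0.96) is always a lower bound on the normal column, as the table shows.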
Convergence studies the limiting behavior of a sequence of IID random variables \(X_1, X_2, \dots, X_n\) with CDFs \(F_n\). Let \(X\) be another random variable with CDF \(F\). There are three types of convergence.
Let \(X_1, \dots, X_n\) be IID with \(\mathbb{E}(X_i) = \mu\) and \(\mathbb{V}(X_i) = \sigma^2\). The sample mean is \(\bar{X}_n = \frac{1}{n}\sum X_i\).
WLLN states
\(\bar{X}_n \xrightarrow{P} \mu\) as \(n \to \infty\).
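A quick simulation illustrates the WLLN with \(X_i \sim \text{Uniform}(0,1)\), so \(\mu = 0.5\); the deviation \(|\bar{X}_n - \mu|\) tends to shrink as \(n\) grows:

```python
import random

random.seed(0)
mu = 0.5  # true mean of Uniform(0, 1)

# The sample mean concentrates around mu as n grows.
for n in (10, 1_000, 100_000):
    xbar = sum(random.random() for _ in range(n)) / n
    print(n, round(abs(xbar - mu), 4))
```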
Bayesian perspective of probability
Theorem
Let \(X_1, X_2, \dots, X_n\) be a sequence of IID random variables with mean \(\mu\) and variance \(\sigma^2\), and let \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) be the sample mean. Then
\[ \small \bar{X}_n \leadsto \text{Normal}(\mu, \frac{\sigma^2}{n}) \]
Implication
Define \[ \small Z = \frac{(\bar{X}_n - \mu)}{\sigma / \sqrt{n}} . \]
then the following statements are equivalent forms of the CLT: \[\begin{aligned} \small Z &\leadsto N(0,1) \\ \bar{X}_n -\mu &\leadsto N\left(0, \frac{\sigma^2}{n}\right)\\ \sqrt{n}(\bar{X}_n -\mu) &\leadsto N(0, \sigma^2) \end{aligned} \]
Tip: Any proper algebraic transformation applied to both sides results in a valid equivalent statement.
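The standardized form \(Z \leadsto N(0,1)\) is easy to check by simulation, even for a non-normal population such as \(\text{Uniform}(0,1)\):

```python
import math
import random

random.seed(0)
n, reps = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0, 1)

# Standardize the sample mean; by the CLT, Z should be roughly N(0, 1).
zs = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    zs.append((xbar - mu) / (sigma / math.sqrt(n)))

z_mean = sum(zs) / reps
z_var = sum((z - z_mean) ** 2 for z in zs) / reps
print(round(z_mean, 2), round(z_var, 2))  # close to 0 and 1
```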
When the sample size is extremely large, almost any tiny difference becomes “statistically significant” (\(p < 0.05\)).
Reason
Implication
We try to estimate the true mean \(\mu\) of a population distributed as \(N(0.2, 0.2^2)\).
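A sketch of this phenomenon with a two-sided z-test: the hypothesized mean \(\mu_0 = 0.21\) is a hypothetical choice, deliberately only 0.01 away from the true mean 0.2, so the difference is tiny relative to \(\sigma = 0.2\):

```python
import math
import random

random.seed(0)
# Sample from the population N(0.2, 0.2^2) and z-test against a hypothesized
# mean mu0 = 0.21, a tiny (and hypothetical) 0.01 difference from the truth.
mu0, sigma = 0.21, 0.2

def p_value(n):
    """Two-sided z-test p-value for H0: mu = mu0 from a fresh sample of size n."""
    xbar = sum(random.gauss(0.2, sigma) for _ in range(n)) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 1 - math.erf(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# The same tiny difference drifts toward "statistically significant"
# purely because n grows.
for n in (100, 10_000, 1_000_000):
    print(n, round(p_value(n), 4))
```

At \(n = 10^6\) the standard error is \(0.2/1000 = 0.0002\), so even a 0.01 difference is about 50 standard errors away and the p-value is effectively zero.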